Abstract…

Introduction

General introduction of the raw dataset:

Some data of squirrels in New York central park were collected starting from October 6th 2018 to 20th over a 14-day period. Some of their characteristics like ages and fur colors and some of the activities like sounds and locations were recorded.

Motivation and initial question:

Squirrels are found everywhere, and it’s observed that some places have more squirrels than others, but is there any trend of where they stay with respect to their colors, ages, activities or all other features? Doing an analysis using squirrel census data may answer the question.

Main final goal:

The main final goals of the project are to make maps according to the census and build functions to predict locations of particular squirrels if their characteristics are provided so that people can use the website to look for the kinds of squirrels they like.

Inspirations:

The maps-making process shown in class caught our eyes since it is a clear and direct way to convey the information. A website called The Squirrel Census (https://www.thesquirrelcensus.com/about ) did research on the squirrels too, but it doesn’t provide any predictions, so we aim to develop a prediction system. Also, in order to attract more people to the website, some interactive plots will be made to fully introduce the raw dataset. The census provides only the data of central park in 2018, so other datasets like data of central park in other years or data in 2018 of other places will also be collected to make comparisons of any location changes.

Method

Source:

The original raw data we used to analysis is from NYC Open Data, it includes 3,023 observations in total and some of their characteristics and corresponding locations are recorded

Preliminary work:

First off, the original raw dataset only includes the data from 2018 in central park, other two datasets were found as extra supporting materials to compare with the raw model. One dataset contains information about not only the squirrels in central park, but also in the whole new york city. Another one is about characteristics and behaviors of different animals and squirrels are also included.

Data cleaning:

For data tidy and cleaning, the categorical variables were transformed to numeric ones for analysis and model building. We didn’t discard the missing values directly but code them as 0 since there are a lot of them, and omitting them might lose the validity of the prediction. The dates were also cleaned. For other comparison plots, same data cleaning step was followed.

Model building process:

Since the outputs are both longitudinal and latitudinal, we expected to make two functions, with longitudinal and latitudinal being the outputs separately against predictors, and combine the two outputs in the end.

Results

Data summaries:

First graph we drew was ‘Number of Observations’ v.s. ‘Day’, and morning and afternoon data were separated and found out that squirrels tend to be more active in the afternoon or at night time. However, the limitation of the data is that we were not able to get the exact time period of their activities but only either morning or evening, we can assume they are present prior to sunset since they should be busy collecting the food when there is sunlight…

Some interesting plots:

Main model:

Comparions:

Discussion

Findings:

Conclusion